Consumer-level multimedia event detection through unsupervised audio signal modeling
Authors
Abstract
In this work, a novel acoustic characterization approach to the multimedia event detection (MED) task for unconstrained and unstructured consumer-level videos is proposed, based on audio signal modeling. The key idea is to characterize the acoustic space of interest with a set of fundamental acoustic units, around which a set of acoustic segment models (ASMs) is built. A vector space modeling technique is adopted to address MED: an incoming audio signal is first decoded into a sequence of acoustic segments, a feature vector is then generated from co-occurrence statistics of the acoustic units, and the final MED decision is made with a vector space language classifier. Experimental evidence on the TRECVID 2011 MED task demonstrates the viability of the proposed approach. Furthermore, the approach better accounts for temporal dependencies than previously proposed MFCC bag-of-words approaches.
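The vector space step described in the abstract can be illustrated with a minimal sketch. It assumes the audio has already been decoded into acoustic-unit labels (the ASM decoder itself is not shown), and it uses unigram and bigram relative frequencies as one plausible form of "co-occurrence statistics". The unit names (`u0`, `u1`, `u2`), the toy training sequences, and the nearest-centroid cosine classifier are all hypothetical stand-ins chosen for illustration; they are not the authors' vector space language classifier.

```python
from collections import Counter
from itertools import product
import math

def cooccurrence_vector(units, vocab):
    """Map a decoded unit sequence to unigram + bigram relative frequencies."""
    unigrams = Counter(units)
    bigrams = Counter(zip(units, units[1:]))  # adjacent pairs keep local order
    n_uni = max(len(units), 1)
    n_bi = max(len(units) - 1, 1)
    vec = [unigrams[u] / n_uni for u in vocab]
    vec += [bigrams[(a, b)] / n_bi for a, b in product(vocab, repeat=2)]
    return vec

def cosine(x, y):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0

def centroid(vectors):
    """Component-wise mean of equally sized vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def classify(vec, centroids):
    """Pick the label whose centroid is most similar to vec (assumed classifier)."""
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))

# Hypothetical unit inventory and toy decoded sequences.
VOCAB = ["u0", "u1", "u2"]
train = {
    "event": [["u0", "u1", "u0", "u1", "u0"], ["u1", "u0", "u1", "u0"]],
    "background": [["u2", "u2", "u2", "u1"], ["u2", "u2", "u1", "u2", "u2"]],
}
centroids = {
    label: centroid([cooccurrence_vector(seq, VOCAB) for seq in seqs])
    for label, seqs in train.items()
}

query = ["u0", "u1", "u0", "u1"]
predicted = classify(cooccurrence_vector(query, VOCAB), centroids)
print(predicted)
```

Because the bigram counts record which unit follows which, this representation retains some temporal ordering that a plain bag-of-words histogram would discard, which is in the spirit of the comparison the abstract makes.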
Similar resources
Bag-of-Audio-Words Approach for Multimedia Event Classification
With the popularity of online multimedia videos, there has been much interest in recent years in acoustic event detection and classification for the improvement of online video search. The audio component of a video has the potential to contribute significantly to multimedia event classification. Recent research in audio document classification has drawn parallels to text and image document ret...
Multimedia Event Detection using Visual Features
Learning spatial features from static images has traditionally involved approaches such as SIFT, HOG and SURF, to name a few. These approaches typically learn low-level hand-designed features which are difficult and time-consuming to extend to the video domain. Furthermore, recent research has shown that there is no universal set of hand-designed features for all datasets. Therefore, learning f...
Audio self organized units for high-level event detection
High-level multimedia event detection aims to identify videos containing a target event. Recent approaches leveraging audio information for this task fall into two broad categories. The first corresponds to holistic bag-of-words approaches based on frame-level descriptors. These are effective for classification, but hard for humans to interpret. The second corresponds to approaches that build a...
Audio-concept features and hidden Markov models for multimedia event detection
Multimedia event detection (MED) on user-generated content is the task of finding an event, e.g., a Flash mob or Attempting a bike trick, using its content characteristics. Recent research has focused on approaches that use semantically defined “concepts” trained with annotated audio clips. Using audio concepts allows us to show semantic evidence of their relationship to events, by looking at t...
Robust audio-codebooks for large-scale event detection in consumer videos
In this paper we present our audio-based system for detecting “events” within consumer videos (e.g. YouTube) and report our experiments on the TRECVID Multimedia Event Detection (MED) task and development data. Codebook or bag-of-words models have been widely used in text, visual and audio domains and form the state-of-the-art in MED tasks. The overall effectiveness of these models on such dat...
Publication date: 2012